Learning Rate Adaptation in Stochastic Gradient Descent
Abstract
The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective aids the development of effective training algorithms, because the problem of minimizing a function is well known in the field of numerical analysis. Typically, deterministic minimization methods are employed; however, in several cases, significant training speed-ups and alleviation of the local minima problem can be achieved when stochastic minimization methods are used. In this paper, a method for adapting the learning rate in stochastic gradient descent is presented. The main feature of the proposed learning rate adaptation scheme is that it exploits gradient-related information from the current as well as the two previous pattern presentations. This seems to provide some stabilization in the value of the learning rate and helps stochastic gradient descent to exhibit fast convergence and a high rate of success. Tests on various problems validate the above-mentioned characteristics of the new algorithm.
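To make the flavor of such a scheme concrete, the following is a minimal sketch, assuming a multiplicative increase/decrease rule driven by the inner products of the stochastic gradients from the current and the two previous pattern presentations. The function names (`sgd_with_adaptive_lr`, `grad_fn`) and the update constants are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def sgd_with_adaptive_lr(grad_fn, w, patterns, lr=0.1,
                         inc=1.05, dec=0.7, lr_min=1e-6, lr_max=1.0):
    """Per-pattern SGD whose learning rate is adapted from the gradients
    of the current and the two previous pattern presentations.
    Illustrative sketch, not the paper's exact rule."""
    g_prev, g_prev2 = None, None
    for x in patterns:
        g = grad_fn(w, x)                      # stochastic gradient for this pattern
        if g_prev is not None and g_prev2 is not None:
            s1 = np.dot(g, g_prev)             # agreement with previous gradient
            s2 = np.dot(g_prev, g_prev2)       # agreement one step further back
            if s1 > 0 and s2 > 0:
                lr = min(lr * inc, lr_max)     # consistent direction: grow the rate
            elif s1 < 0 and s2 < 0:
                lr = max(lr * dec, lr_min)     # persistent oscillation: shrink it
            # mixed signs: leave the rate unchanged (the stabilizing case)
        w = w - lr * g                         # standard SGD step
        g_prev2, g_prev = g_prev, g
    return w, lr
```

Using three presentations rather than two means a single noisy gradient cannot flip the rate by itself, which is one way such a rule can stabilize the learning rate under stochastic updates.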
Similar resources
Chapter 2 LEARNING RATE ADAPTATION IN STOCHASTIC GRADIENT DESCENT
Coupling Adaptive Batch Sizes with Learning Rates
Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance o...
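The variance argument above suggests what such a coupling could look like in code. The sketch below is an illustrative heuristic assuming access to per-example gradients; the name `suggest_batch_size` and the constants are hypothetical, and this is not the exact rule derived in the paper.

```python
import numpy as np

def suggest_batch_size(per_example_grads, lr, b_min=16, b_max=4096, c=1.0):
    """Illustrative heuristic: pick a batch size so that mini-batch
    gradient noise stays in proportion to the step size."""
    g_bar = per_example_grads.mean(axis=0)        # mini-batch gradient estimate
    var = per_example_grads.var(axis=0).sum()     # total per-example gradient variance
    signal = np.dot(g_bar, g_bar)                 # squared gradient norm
    # Larger learning rates and noisier gradients call for larger batches.
    b = c * lr * var / max(signal, 1e-12)
    return int(np.clip(b, b_min, b_max))
```

The intuition is that a large step taken on a noisy gradient estimate is wasted work, so the batch grows with the learning rate and the estimated gradient variance.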
Microscale Adaptive Optics: Wave-Front Control with a μ-Mirror Array and a VLSI Stochastic Gradient Descent Controller.
The performance of adaptive systems that consist of microscale on-chip elements [microelectromechanical mirror (μ-mirror) arrays and a VLSI stochastic gradient descent microelectronic control system] is analyzed. The μ-mirror arrays with 5 × 5 and 6 × 6 actuators were driven with a control system composed of two mixed-mode VLSI chips implementing model-free beam-quality metric optimization by...
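Model-free metric optimization of this kind is commonly realized as stochastic parallel gradient descent (SPGD): perturb all actuators at once, measure the change in the quality metric, and correlate the two. The sketch below shows that generic scheme, not the cited VLSI controller; the names and gains are illustrative.

```python
import numpy as np

def spgd_step(u, metric, gain=0.5, delta=0.05, rng=np.random.default_rng()):
    """One model-free stochastic parallel gradient descent step on a
    measured quality metric J(u), where u holds the actuator commands.
    Generic SPGD sketch, not the cited controller."""
    du = delta * rng.choice([-1.0, 1.0], size=u.shape)  # random +/- perturbation on all actuators
    dJ = metric(u + du) - metric(u - du)                # two-sided metric measurement
    return u + gain * dJ * du                           # correlate metric change with perturbation

# Usage on a toy 6 x 6 actuator array with a synthetic metric to maximize:
u = np.zeros((6, 6))
target = np.ones((6, 6))
metric = lambda v: -np.sum((v - target) ** 2)           # beam-quality surrogate (higher is better)
for _ in range(500):
    u = spgd_step(u, metric)
```

Because only scalar metric readings are needed, no model of the optical system is required, which is what makes the scheme attractive for on-chip control.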
Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network
Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...
Online Learning Rate Adaptation with Hypergradient Descent
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the man...
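For plain SGD, the hypergradient update takes a particularly simple form: the learning rate is itself adjusted by gradient descent on the loss, using the inner product of the current and previous stochastic gradients. A minimal sketch under that reading follows; the hyper-learning rate `beta` and the function names are illustrative.

```python
import numpy as np

def sgd_hd(grad_fn, w, steps, alpha=0.01, beta=1e-4):
    """SGD with hypergradient descent on the learning rate alpha:
    alpha is nudged by the inner product of successive stochastic
    gradients before each weight update."""
    g_prev = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        # The hypergradient of the loss w.r.t. alpha is -(g . g_prev),
        # so descending it adds beta * (g . g_prev) to alpha.
        alpha += beta * np.dot(g, g_prev)
        w = w - alpha * g
        g_prev = g
    return w, alpha
```

When successive gradients point the same way, alpha grows; when they oppose, it shrinks, so a rough initial alpha is corrected online instead of being tuned by hand.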